19 research outputs found

    CGDSNPdb: a database resource for error-checked and imputed mouse SNPs

    The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the ‘imputed genotype resource’ in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600 000 SNPs and over 900 000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login

    An imputed genotype resource for the laboratory mouse

    We have created a high-density SNP resource encompassing 7.87 million polymorphic loci across 49 inbred mouse strains of the laboratory mouse by combining data available from public databases and training a hidden Markov model to impute missing genotypes in the combined data. The strong linkage disequilibrium found in dense sets of SNP markers in the laboratory mouse provides the basis for accurate imputation. Using genotypes from eight independent SNP resources, we empirically validated the quality of the imputed genotypes and demonstrate that they are highly reliable for most inbred strains. The imputed SNP resource will be useful for studies of natural variation and complex traits. It will facilitate association study designs by providing high density SNP genotypes for large numbers of mouse strains. We anticipate that this resource will continue to evolve as new genotype data become available for laboratory mouse strains. The data are available for bulk download or query at http://cgd.jax.org/
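The idea of imputing a missing genotype from local haplotype structure can be illustrated with a toy forward-backward pass. This is a minimal sketch, not the authors' trained model: the function `impute`, the three reference haplotypes, and the `switch` and `err` parameters are all hypothetical, and the sketch assumes biallelic loci and at least two reference haplotypes.

```python
def impute(obs, haplotypes, switch=0.05, err=0.01):
    """Return (allele, probability) per locus; None entries in obs are imputed.

    Hidden states are the reference haplotypes (assumed, for illustration).
    `switch` is the per-locus chance of changing haplotype, `err` the chance
    that an observed allele disagrees with the underlying haplotype.
    """
    n_states, n_loci = len(haplotypes), len(obs)

    def emit(k, l):
        # A missing observation carries no information about the state.
        if obs[l] is None:
            return 1.0
        return 1.0 - err if haplotypes[k][l] == obs[l] else err

    def trans(i, j):
        return 1.0 - switch if i == j else switch / (n_states - 1)

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Forward pass (rescaled at each locus to avoid underflow).
    fwd = [normalize([emit(k, 0) for k in range(n_states)])]
    for l in range(1, n_loci):
        prev = fwd[-1]
        fwd.append(normalize([
            emit(j, l) * sum(prev[i] * trans(i, j) for i in range(n_states))
            for j in range(n_states)
        ]))

    # Backward pass.
    bwd = [[1.0] * n_states for _ in range(n_loci)]
    for l in range(n_loci - 2, -1, -1):
        bwd[l] = normalize([
            sum(trans(i, j) * emit(j, l + 1) * bwd[l + 1][j]
                for j in range(n_states))
            for i in range(n_states)
        ])

    calls = []
    for l in range(n_loci):
        if obs[l] is not None:
            calls.append((obs[l], 1.0))
            continue
        # Posterior over haplotype states at the missing locus.
        post = normalize([f * b for f, b in zip(fwd[l], bwd[l])])
        scores = {}
        for allele in {h[l] for h in haplotypes}:
            scores[allele] = sum(
                p * (1.0 - err if h[l] == allele else err)
                for p, h in zip(post, haplotypes)
            )
        best = max(scores, key=scores.get)
        calls.append((best, scores[best]))
    return calls


# Hypothetical reference haplotypes and one strain with a missing call.
haplotypes = ["AAGTC", "GCGAT", "GAGAC"]
obs = ["A", "A", "G", None, "C"]
calls = impute(obs, haplotypes)
```

Because the flanking alleles match the first reference haplotype, the posterior at the missing locus concentrates on that state, which is the sense in which strong linkage disequilibrium makes imputation accurate.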

    A customized and versatile high-density genotyping array for the mouse

    We designed a high-density mouse genotyping array containing 623,124 SNPs that capture the known genetic variation present in the laboratory mouse. The array also contains 916,269 invariant genomic probes that are targeted to functional elements and regions known to harbor segmental duplications. The array opens the door to the characterization of genetic diversity, copy number variation, allele specific gene expression and DNA methylation and will extend the successes of human genome-wide association studies to the mouse

    Systematic variation in mRNA 3′-processing signals during mouse spermatogenesis

    Gene expression and processing during mouse male germ cell maturation (spermatogenesis) is highly specialized. Previous reports have suggested that there is a high incidence of alternative 3′-processing in male germ cell mRNAs, including reduced usage of the canonical polyadenylation signal, AAUAAA. We used EST libraries generated from mouse testicular cells to identify 3′-processing sites used at various stages of spermatogenesis (spermatogonia, spermatocytes and round spermatids) and testicular somatic Sertoli cells. We assessed differences in 3′-processing characteristics in the testicular samples, compared to control sets of widely used 3′-processing sites. Using a new method for comparison of degenerate regulatory elements between sequence samples, we identified significant changes in the use of putative 3′-processing regulatory sequence elements in all spermatogenic cell types. In addition, we observed a trend towards truncated 3′-untranslated regions (3′-UTRs), with the most significant differences apparent in round spermatids. In contrast, Sertoli cells displayed a much smaller trend towards 3′-UTR truncation and no significant difference in 3′-processing regulatory sequences. Finally, we identified a number of genes encoding mRNAs that were specifically subject to alternative 3′-processing during meiosis and postmeiotic development. Our results highlight developmental differences in polyadenylation site choice and in the elements that likely control them during spermatogenesis

    The BioMart community portal: an innovative alternative to large, centralized data repositories.

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations

    A review of psychological approaches to working with older adults with personality disorders

    Motivation: Cis-acting regulatory elements are frequently constrained by both sequence content and positioning relative to a functional site, such as a splice or polyadenylation site. We describe an approach to regulatory motif analysis based on non-negative matrix factorization (NMF). Whereas existing pattern recognition algorithms commonly focus primarily on sequence content, our method simultaneously characterizes both positioning and sequence content of putative motifs. Results: Tests on artificially generated sequences show that NMF can faithfully reproduce both positioning and content of test motifs. We show how the variation of the residual sum of squares can be used to give a robust estimate of the number of motifs or patterns in a sequence set. Our analysis distinguishes multiple motifs with significant overlap in sequence content and/or positioning. Finally, we demonstrate the use of the NMF approach through characterization of biologically interesting datasets. Specifically, an analysis of mRNA 3′-processing (cleavage and polyadenylation) sites from a broad range of higher eukaryotes reveals a conserved core pattern of three elements. Supplementary information: Supplementary data are available at Bioinformatics online. National Science Foundation (DBI-0331497); National Institutes of Health (2 P20 RR16463, 1R01GM072706)
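A minimal sketch of the general idea, assuming a count matrix `V` whose rows are sequence words (k-mers) and whose columns are positions relative to the functional site. Factoring V ≈ WH then gives columns of `W` that describe sequence content and rows of `H` that describe positioning. The multiplicative-update solver, the toy matrices `W_true` and `H_true`, and the helper names below are illustrative assumptions, not the implementation used in the paper.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, n_iter=300, seed=0, eps=1e-9):
    """Factor V ~ W @ H with Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def rss(V, W, H):
    """Residual sum of squares between V and its reconstruction W @ H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))


# Toy data: two motifs with distinct word content and positional peaks.
positions = range(40)
H_true = [
    [math.exp(-0.5 * ((p - 10) / 2.0) ** 2) for p in positions],
    [math.exp(-0.5 * ((p - 30) / 3.0) ** 2) for p in positions],
]
W_true = [[5.0, 0.0], [3.0, 0.0], [0.0, 4.0], [0.0, 2.0], [0.0, 1.0]]
V = matmul(W_true, H_true)

# RSS as a function of rank: the drop should level off near the true motif count.
rss_by_rank = {r: rss(V, *nmf(V, r)) for r in (1, 2, 3)}
```

Plotting `rss_by_rank` against rank and looking for the point where the decrease flattens is one simple way to read off a plausible number of motifs; on real count data the curve is noisier and the choice is less clear-cut.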

    C. elegans sequences that control trans-splicing and operon pre-mRNA processing

    Many mRNAs in Caenorhabditis elegans are generated through a trans-splicing reaction that adds one of two classes of spliced leader RNA to an independently transcribed pre-mRNA. SL1 leaders are spliced mostly to pre-mRNAs from genes with outrons, intron-like sequences at the 5′-ends of the pre-mRNAs. In contrast, SL2 leaders are nearly exclusively trans-spliced to genes that occur downstream in polycistronic pre-mRNAs produced from operons. Operon pre-mRNA processing requires separation into individual transcripts, which is accomplished by 3′-processing of upstream genes and spliced leader trans-splicing to the downstream genes. We used a novel computational analysis, based on nonnegative matrix factorization, to identify and characterize significant differences in the cis-acting sequence elements that differentiate various types of functional site, including internal versus terminal 3′-processing sites, and SL1 versus SL2 trans-splicing sites. We describe several key elements, including the U-rich (Ur) element that couples 3′-processing with SL2 trans-splicing, and a novel outron (Ou) element that occurs upstream of SL1 trans-splicing sites. Finally, we present models of the distinct classes of trans-splicing reaction, including SL1 trans-splicing at the outron, SL2 trans-splicing in standard operons, competitive SL1-SL2 trans-splicing in operons with large intergenic separation, and SL1 trans-splicing in SL1-type operons, which have no intergenic separation